Measurements and data analysis have proved very 
effective in the study of the Internet's physical fabric and 
have shown heterogeneities and statistical fluctuations 
extending over several orders of magnitude. Here we 
analyze performance measurements obtained by the PingER 
monitoring infrastructure. We focus on the relationship 
between the Round-Trip-Time (RTT) and the geographical 
distance. We define dimensionless variables that contain 
information on the quality of Internet connections, finding 
that their probability distributions are characterized by a 
slow power-law decay signalling the presence of scale-free 
features. These results point out the extreme heterogeneity 
of the Internet since the transmission speed between differ- 
ent points of the network exhibits very large fluctuations. 
The associated scaling exponents appear to have fairly 
stable values in different data sets and thus define an 
invariant characteristic of the Internet that might be used in 
the future as a benchmark of the overall state of "health" 
of the Internet. The observed scale-free character should be 
incorporated in models and analysis of Internet performance. 

The Internet is a self-organizing system whose size has already 
scaled five orders of magnitude since its inception. Given the 
extremely complex and interwoven structure of the Internet, 
several research groups have deployed technologies and 
infrastructures aimed at obtaining a more global picture of the 
Internet. This has led to very interesting findings concerning the 
topology of Internet maps. Connectivity and other metrics are 
characterized by algebraic statistical distributions that signal 
fluctuations extending over many length scales [2]. 
These scale-free properties and the associated heterogeneity 
of the Internet fabric define a large scale object whose prop- 
erties cannot be inferred from local ones, and are in sharp 
contrast with standard graph models. The importance of a 
correct topological characterization of the Internet in routing 
protocols and the parallel advancement in the understanding 
of scale-free networks have triggered a renewed interest 
in Internet measurements and modeling. Considerable effort 
has also been devoted to the collection of end-to-end 
performance data by means of active measurement techniques. 
This activity has stimulated several studies that, however, focus 
mainly on individual properties of hosts, routers or routes. 
Only recently has an increasing body of work focused on the 
performance of the Internet as a whole, especially to forecast 
future performance trends. These measurements pointed 
out the presence of highly heterogeneous performance, and 
it is our interest to inspect the possibility of a cooperative 
"emergent phenomenon" with associated scale-free behavior. 

The basic testing package for Internet performance is the 
original PING (Packet InterNet Groper) program. Based on 
the Internet Control Message Protocol (ICMP), Ping works 
much like a sonar echo-location, sending packets that elicit 
a reply from the targeted host. The program then measures 
the round-trip-time (RTT), i.e. how long it takes 
each packet to make the round trip. Organizations such 
as the National Laboratory for Applied Network Research 
(http://moat.nlanr.net/) and the Cooperative Association for 
Internet Data Analysis (http://www.caida.org/) use PING-like 
probes from geographically diverse monitors to collect 
RTT data to hundreds or thousands of Internet destinations. 
Our Internetwork Performance Measurement (IPM) project 
currently participates in the PingER monitoring infrastructure 
(http://www-iepm.slac.stanford.edu/). PingER was developed 
by the Internet End-to-end Performance Measurement 
(IEPM) group to monitor the end-to-end performance 
of Internet links. It consists of a number of beacon sites that 
regularly send ICMP probes to hundreds of targets and store 
all data centrally. Most beacons and targets are hosts belong- 
ing to universities or research centers; they are connected to 
many different networks and backbones and have a very wide 
geographical distribution, so they likely represent a statisti- 
cally significant sample of the Internet as a whole. 

We have analyzed two years' worth of PingER data, going 
from April 2000 to March 2002. We have selected 3353 different 
beacon-target pairs, taken out of 36 beacons and 196 
targets. For each pair we have considered the following metrics: 
the geographic distance of the hosts $d$ (measured on a 
great circle), the monthly average packet loss rate $r$ (the 
percentage of ICMP packets that do not reach the target point), 
and the monthly minimum and average round-trip-times, $RTT_{min}$ 
and $RTT_{av}$, respectively. These data offer the opportunity to 
test various hypotheses on the statistical behavior of Internet 
performance. Each data point is the monthly summary 
of approximately 1450 single measurements. The geographic 
position of hosts is known with great accuracy for some sites, 
but in most cases it may be wrong by 10-20 km. Consequently, 
we have discarded pairs of sites that are less than this distance 
apart. 

The end-to-end delay is governed by several factors. 
First, digital information travels along fiber optic cables at almost 
exactly 2/3 the speed of light in vacuum. This gives the 
mnemonically very convenient value of 1 ms RTT per 100 km 
of cable. Using this speed one can express the geographic distance 
$d$ in light-milliseconds, obtaining an absolute physical 
lower bound on the RTT between sites. The actual measured 
RTT is (usually) larger than this value for several reasons. 
To begin with, data packets often follow rather circuitous paths 
leading them through a number of nodes that are far from the 
geodesic line between the endpoints. Furthermore, each link 
in a given path is itself far from being straight, often following 
highways, railways or power lines [11]. The combination 
of these factors produces a purely geometrical enhancement 
factor of the RTT. In addition, there is a minimum processing 
delay $\delta$ introduced by each router along the way, of the order 
of 50-250 $\mu$s per hop on average, summing up to a few ms for 
a typical path [11]. This can be significant for very close site 
pairs, but is negligible for most of the paths in the PingER 
sample. 

On top of this, the presence of cross traffic along the 
route can cause data packets to be queued in the routers. Let 
$t_R$ be the sum of all processing and queueing delays due to 
the routers on a path. When the traffic reaches congestion, 
$t_R$ becomes a very significant part of the RTT and packet loss 
also sets in. We have considered minimum and average values 
of the RTT over one-month periods. It is plausible that even 
on rather congested links there will be a moment in the course 
of a month when $t_R$ is negligible, so $RTT_{min}$ can be taken as 
an estimate of the best possible communication performance 
on the given data path, subject only to the intrinsic geometrical 
enhancement factor and the minimum processing delay. 
On the other hand, $RTT_{av}$ for a given site pair is obtained by 
considering the average RTT over one-month periods. This 
also takes into account the average queueing delay and gives 
an estimate of the overall communication performance on the 
given data path. 

FIG. 1: $RTT_{min}$ between 2114 host pairs (PingER data set 
of February 2002) as a function of their distance $d$. Each 
point corresponds to a different host pair. The line indicates 
the physical lower bound provided by the speed of light in 
transmission cables. It is possible to observe the very large 
fluctuations in the $RTT_{min}$ of different host pairs separated 
by the same distance. For graphical reasons the picture frame 
is limited to 400 ms; however, several outliers up to 900 ms are 
present in the data set. 
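
As a rough illustration of the quantities introduced above, the following Python sketch summarizes one month of probes for a single beacon-target pair and evaluates the physical lower bound on the RTT from the great-circle distance. The function names, the use of `None` to mark a lost probe, the haversine formula with a mean Earth radius of 6371 km, and the sample coordinates and RTT values are all assumptions made for this example; none of them is taken from the PingER software or data.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value)
RTT_MS_PER_KM = 1.0 / 100  # 1 ms of RTT per 100 km of fiber, as quoted in the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rtt_lower_bound_ms(distance_km):
    """Distance in light-milliseconds: the smallest physically possible RTT."""
    return distance_km * RTT_MS_PER_KM


def monthly_summary(samples_ms):
    """Return (RTT_min, RTT_av, loss rate in percent); None marks a lost probe."""
    answered = [s for s in samples_ms if s is not None]
    if not answered:
        raise ValueError("no probe was answered")
    loss = 100.0 * (len(samples_ms) - len(answered)) / len(samples_ms)
    return min(answered), sum(answered) / len(answered), loss


# Hypothetical pair: two hosts on the equator, 90 degrees of longitude apart.
d_km = great_circle_km(0.0, 0.0, 0.0, 90.0)
rtt_min, rtt_av, loss = monthly_summary([120.0, 135.0, None, 150.0])
print(round(d_km), round(rtt_lower_bound_ms(d_km), 1), rtt_min, rtt_av, loss)
```

In this made-up case the minimum RTT lies above the bound given by the distance, which is the situation the text describes for almost all measured pairs.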

We studied the level of correlation between geographic distance 
and the $RTT_{min}$ and $RTT_{av}$ of source-destination pairs. 
In Fig. 1 we report the obtained relationship for $RTT_{min}$, compared 
with the solid line representing the speed of light in 
optic fibers at each distance. While it is possible to observe 
a linear correlation of $RTT_{min}$ with the physical distance 
of hosts, the data are extremely scattered. The $RTT_{av}$ 
presents a qualitatively very similar behavior, and it is worth 
remarking that both plots are in good agreement with similar 
analyses obtained for different data sets. While 
several qualitative features of this plot provide insight into 
the geographical distribution of hosts and their connectivity, 
it misses a quantitative characterization of the intrinsic fluctuations 
of performance and their statistical properties. 

FIG. 2: Cumulative distributions of the round-trip-times 
normalized with the actual distance $d$ between host pairs. The 
linear behavior in the double logarithmic scale indicates 
a broad distribution with power-law behavior. (a) In the 
case of the normalized minimum round-trip-times $\tau_{min}$, 
the slope of the reference line is -2.0. (b) In the case of 
the normalized average round-trip-times $\tau_{av}$, the reference 
line has a slope of -1.5. The insets of (a) and (b) report the 
distributions obtained for the Gloperf data set. In both cases 
we obtain power-law behaviors in good agreement with those 
obtained for the PingER data sets (see Tab. I). 

A more significant characterization of the end-to-end 
performance is obtained by normalizing the latency time 
by the geographical distance between hosts. This defines 
the absolute performance metrics $\tau_{min} = RTT_{min}/d$ and 
$\tau_{av} = RTT_{av}/d$, which represent the minimum and average 
latency time for unit distance, i.e. the inverse of the overall 
communication velocity (note that if we measure $d$ in 
light-milliseconds, $\tau_{min}$ and $\tau_{av}$ are actually dimensionless). 
These metrics allow us to meaningfully compare the performance 
between pairs of hosts with different geographical distances. 
The highly scattered plot of Fig. 1 indicates that 
end-to-end performance fluctuates conspicuously over the whole 
range of geographic distances. In particular, looking at 
collections of host pairs at approximately the same geographical 
distance, we find latency times varying by up to two orders 
of magnitude. 

FIG. 3: Probability density $P(r)$ for the occurrence of 
packet loss rate $r$ on beacon-target pair transmissions. 
The zero on the x axis (ln $r$) corresponds to a 1% rate in packet 
loss. Note that the distribution has a linear behavior in the 
double logarithmic scale, indicating a power-law behavior. 
The reference line has a slope of -1.2. 

The best way to characterize the level 
of fluctuations in latency times is represented by the 
probabilities $P(\tau_{min})$ and $P(\tau_{av})$ that a pair of hosts presents a 
given $\tau_{min}$ and $\tau_{av}$, respectively. In contrast with the usual 
exponential or Gaussian distributions, for which there is a well 
defined scale, we find that the data closely follow a straight line 
in a double logarithmic plot for at least one or two orders of 
magnitude, defining a power-law behavior $P(\tau_{min}) \sim \tau_{min}^{-\alpha_{min}}$ 
and $P(\tau_{av}) \sim \tau_{av}^{-\alpha_{av}}$. In Fig. 2 we show the cumulative 
distributions $P_{cum}(\tau) = \int_{\tau}^{\infty} P(\tau') d\tau'$ obtained from the PingER 
data. If the probability density distribution is a power law 
$P(\tau) \sim \tau^{-\alpha}$, the cumulative distribution preserves the 
algebraic behavior and scales as $P_{cum}(\tau) \sim \tau^{-\alpha+1}$. In addition, 
it has the advantage of being considerably less noisy than the 
original distribution. From the behavior of Fig. 2, a best fit 
of the linear region in the double logarithmic representation 
yields the scaling exponents $\alpha_{min} = 3.0$ and $\alpha_{av} = 2.5$. It 
is worth remarking that the presence of a truncation of the 
power-law behavior for large values is a natural effect implicitly 
present in every real world data set and it is likely due to 
an incomplete statistical sampling of the distribution. 

Power-law distributions are characterized by scale-free properties, 
i.e. unbounded fluctuations and the absence of a meaningful 
characteristic length usually associated with the probability 
distribution peak. In such a case, the mean distribution value 
and the corresponding averages are poorly significant, since 
fluctuations are gigantic and there are non-negligible probabilities 
of very large $\tau_{min}$ and $\tau_{av}$ compared to the average 
values in the whole system. In other words, Internet performance 
is extremely heterogeneous and it is impossible to 
infer local properties from average quantities. 
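
The relation between the slope of the cumulative distribution and the density exponent can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the fit is a plain least-squares line through all points of the empirical cumulative distribution (the text instead fits only the linear region, by an unspecified procedure), and the input is synthetic, built from evenly spaced quantiles of a pure power law rather than from PingER data.

```python
import math


def normalized_latency(rtt_ms, distance_km):
    """tau = RTT / d, with d in light-milliseconds (1 ms RTT per 100 km)."""
    return rtt_ms / (distance_km / 100.0)


def ccdf_points(values):
    """Empirical P_cum(x): fraction of samples >= x (assumes no repeated values)."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (n - i) / n) for i, x in enumerate(ordered)]


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def density_exponent(values):
    """Estimate alpha in P(tau) ~ tau^(-alpha) from the cumulative slope -(alpha - 1)."""
    return 1.0 - loglog_slope(ccdf_points(values))


# Synthetic check: evenly spaced quantiles of a pure power law with alpha = 3.
alpha_true = 3.0
n = 1000
synthetic = [((i + 0.5) / n) ** (-1.0 / (alpha_true - 1.0)) for i in range(n)]
alpha_est = density_exponent(synthetic)
print(round(alpha_est, 2))
```

On real data the fit would have to be restricted to the range where the double logarithmic plot is actually linear, since the truncation at large values would otherwise bias the slope.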

The origin of scale-free behavior is usually associated with 
critical cooperative dynamical effects. Critical and scale-free 
behavior has been observed and characterized in queueing 
properties at router interfaces, probably affecting conspicuously 
the distribution of $\tau_{av}$. It is, however, unclear why scale-free 
properties are observed also in the distribution of $\tau_{min}$. 
In this case traffic effects should be negligible, and it is well 
known that the distribution of hop counts between hosts 
has a well defined peak and no fat tails [10]. On the contrary, 
we find that minimum latency times are distributed over more 
than two orders of magnitude. Potentially, cable wiggliness, 
Internet connectivity and hardware heterogeneities might be 
playing a role in the observed performance distribution. 

Data set    $\alpha_{min}$   $\alpha_{av}$   $\langle\tau_{min}\rangle$   $\langle\tau_{av}\rangle$
April '00   2.7 ± 0.2        2.2 ± 0.2       3.7                          6.6
Feb. '01    2.9 ± 0.2        2.4 ± 0.2       3.6                          6.6
Feb. '02    3.0 ± 0.2        2.5 ± 0.2       3.1                          5.3
Gloperf     2.7 ± 0.2        2.4 ± 0.2       5.4                          7.8

TABLE I: The table shows the improving performance over 
the years for the PingER data sample. As an independent 
check, we report the values obtained from the analysis of the 
data sample of the Gloperf project. 

It is worth remarking that a tendency to improved performance 
is observed over the two-year period of data collection. 
Table I shows that the averages over all the site pairs, 
$\langle\tau_{min}\rangle$ and $\langle\tau_{av}\rangle$, decrease steadily, whereas the exponents 
$\alpha_{min}$ and $\alpha_{av}$ increase, signalling a faster decay of the 
distribution tails. We can consider the improvement of performance 
as the byproduct of the technological drift to better 
lines and routers. On the other hand, the large fluctuations 
present in the Internet performance appear to be a stable and 
general feature of the statistical analysis. In order to have an 
independent check of the PingER results, we have also considered 
the Gloperf data set used in an earlier study. We have extracted 
a set of parameter values for each of 650 unique site 
pairs in the sample and analyzed the statistics. These results 
are also reported in Table I. Although the averages depend 
on the specific characteristics of the sample (size, world region, 
etc.) and differ significantly from the PingER case, the 
existence of power-law tails and the values of the exponents 
seem to be confirmed. These exponents can thus be considered 
as one of the few, and sought-after, reliable and invariant 
properties of the Internet. 

Finally, further evidence of large fluctuations in Internet 
performance is provided by the analysis of the packet loss 
data. Also in this case we are interested in the probability 
$P(r)$ that a certain rate $r$ of packet loss occurs on any given 
pair. We have analyzed the monthly average packet loss 
between PingER beacon-target pairs. In Fig. 3 we report 
the probability $P(r)$ as a function of $r$. The plot shows 
an algebraically decaying distribution that can be well 
approximated by a power-law behavior $P(r) \sim r^{-\gamma}$ with 
$\gamma = 1.2 \pm 0.2$. The slowly decaying probability of large packet 
loss rates is another signature of the very heterogeneous 
performance of the Internet. 

The results presented here 
have implications for the evaluation of performance trends. 
Models for primary performance factors must include the 
high heterogeneities observed in real data. Time and scale 
extrapolations of Internet performance can be seriously 
flawed by considering just the average properties. It is likely 
that we will observe in the future an improvement of the 
average end-to-end performance due to increased bandwidth 
and router speed, but the real improvement of the Internet 
as a whole would correspond to reducing the huge statistical 
fluctuations observed nowadays. On a more theoretical side, 
the explanation and formulation of microscopic models at 
the origin of the scale-free behavior of Internet performance 
appear challenging, to say the least. 
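
For the packet loss density discussed above, one common way to estimate a power-law exponent directly from a density (rather than from a cumulative distribution) is to histogram the data on logarithmically spaced bins. The Python sketch below illustrates that idea only; it is not claimed to be the procedure used for Fig. 3. The function names, the bin range and count, and the synthetic input (evenly spaced quantiles of a power law with exponent 1.2 truncated to [1, 100), loosely standing for loss rates in percent) are assumptions. Values outside the chosen range, including pairs with zero loss, are simply ignored here.

```python
import math


def log_binned_density(values, lo, hi, nbins):
    """Probability density estimated on logarithmically spaced bins over [lo, hi)."""
    ratio = (hi / lo) ** (1.0 / nbins)
    counts = [0] * nbins
    total = 0
    for v in values:
        if lo <= v < hi:
            k = min(int(math.log(v / lo) / math.log(ratio)), nbins - 1)
            counts[k] += 1
            total += 1
    points = []
    for k, c in enumerate(counts):
        left, right = lo * ratio ** k, lo * ratio ** (k + 1)
        if c > 0:
            # geometric bin centre; count normalized by sample size and bin width
            points.append((math.sqrt(left * right), c / (total * (right - left))))
    return points


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: quantiles of P(r) ~ r^(-gamma), gamma = 1.2, truncated to [1, 100).
gamma_true, lo, hi, n = 1.2, 1.0, 100.0, 20000
e = 1.0 - gamma_true
synthetic = [(lo ** e + (i + 0.5) / n * (hi ** e - lo ** e)) ** (1.0 / e)
             for i in range(n)]
gamma_est = -loglog_slope(log_binned_density(synthetic, lo, hi, 10))
print(round(gamma_est, 2))
```

Logarithmic bins keep roughly comparable statistics per bin across the tail, which is why they are often preferred to linear bins when the distribution spans orders of magnitude.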